AITopics | palm 2

Collaborating Authors

palm 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Supplementary Materials Appendix Overview

Neural Information Processing Systems, Oct-9-2025, 03:43:18 GMT

Appendix B provides additional implementation details, including a video SPAE variant. Appendix C includes more quantitative evaluation results. Appendix D shows more qualitative examples of model generations. Figure 1 shows an example of the dilation subsampler defined by Eq. (1). We select evenly distributed positions in each layer to form the token pyramid with monotonically increasing layer sizes.

large language model, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)


SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable

Zhang, Jiaxin, Li, Zhuohang, Cui, Wendi, Das, Kamalika, Malin, Bradley, Kumar, Sricharan

arXiv.org Artificial Intelligence, Mar-13-2025

Large language models (LLMs) have demonstrated remarkable performance, yet their diverse strengths and weaknesses prevent any single LLM from achieving dominance across all tasks. Ensembling multiple LLMs is a promising approach to generating reliable responses, but conventional ensembling frameworks suffer from high computational overheads. This work introduces Scalable Consistency Ensemble (SCE), an efficient framework for ensembling LLMs by prompting consistent outputs. The SCE framework systematically evaluates and integrates outputs to produce a cohesive result through two core components: SCE-CHECK, a mechanism that gauges the consistency between response pairs via semantic equivalence; and SCE-FUSION, which merges the highest-ranked consistent responses from SCE-CHECK to optimize collective strengths and mitigate potential weaknesses. To improve scalability with multiple inference queries, we further propose "You Only Prompt Once" (YOPO), a novel technique that reduces the inference complexity of pairwise comparison from quadratic to constant time. We perform extensive empirical evaluations on diverse benchmark datasets to demonstrate SCE's effectiveness. Notably, the SCE-CHECK component outperforms conventional baselines with enhanced performance and a significant reduction in computational overhead.
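The abstract does not spell out how SCE-CHECK scores responses. As a rough illustration only, the sketch below ranks candidate responses by naive pairwise agreement, which is the quadratic-cost baseline that YOPO is described as avoiding. The function names and the string-matching equivalence test are assumptions made for this example; a real system would presumably use an LLM or another semantic-equivalence judge.

```python
from itertools import combinations
from typing import Callable, List


def toy_equivalent(a: str, b: str) -> bool:
    """Hypothetical stand-in for a semantic-equivalence judge: compares normalized text."""
    return a.strip().lower() == b.strip().lower()


def rank_by_consistency(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = toy_equivalent,
) -> List[int]:
    """Return response indices, most-agreed-with first."""
    votes = [0] * len(responses)
    # Naive pairwise comparison: n * (n - 1) / 2 equivalence checks.
    for i, j in combinations(range(len(responses)), 2):
        if equivalent(responses[i], responses[j]):
            votes[i] += 1
            votes[j] += 1
    # sorted() is stable, so ties keep their original order.
    return sorted(range(len(responses)), key=lambda k: -votes[k])


# Toy responses from four hypothetical models.
responses = ["Paris", "paris ", "Lyon", "Paris"]
ranking = rank_by_consistency(responses)
print(ranking)
```

The outlier response ("Lyon") ends up last in the ranking, so a fusion step could restrict itself to the top-ranked, mutually consistent responses.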

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.10881

Country: North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry:

Government (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


Normative Evaluation of Large Language Models with Everyday Moral Dilemmas

Sachdeva, Pratik S., van Nuenen, Tom

arXiv.org Artificial Intelligence, Jan-29-2025

The rapid adoption of large language models (LLMs) has spurred extensive research into their encoded moral norms and decision-making processes. Much of this research relies on prompting LLMs with survey-style questions to assess how well models are aligned with certain demographic groups, moral beliefs, or political ideologies. While informative, the adherence of these approaches to relatively superficial constructs tends to oversimplify the complexity and nuance underlying everyday moral dilemmas. We argue that auditing LLMs along more detailed axes of human interaction is of paramount importance to better assess the degree to which they may impact human beliefs and actions. To this end, we evaluate LLMs on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit, where users seek moral judgments on everyday conflicts from other community members. We prompted seven LLMs to assign blame and provide explanations for over 10,000 AITA moral dilemmas. We then compared the LLMs' judgments and explanations to those of Redditors and to each other, aiming to uncover patterns in their moral reasoning. Our results demonstrate that large language models exhibit distinct patterns of moral judgment, varying substantially from human evaluations on the AITA subreddit. LLMs demonstrate moderate to high self-consistency but low inter-model agreement. Further analysis of model explanations reveals distinct patterns in how models invoke various moral principles. These findings highlight the complexity of implementing consistent moral reasoning in artificial systems and the need for careful evaluation of how different models approach ethical judgment. As LLMs continue to be used in roles requiring ethical decision-making such as therapists and companions, careful evaluation is crucial to mitigate potential biases and limitations.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.18081

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Ontario > Toronto (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Media > News (0.59)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)


Assessing Personalized AI Mentoring with Large Language Models in the Computing Field

Luo, Xiao, O'Connell, Sean, Mithun, Shamima

arXiv.org Artificial Intelligence, Dec-11-2024

This paper provides an in-depth evaluation of three state-of-the-art Large Language Models (LLMs) for personalized career mentoring in the computing field, using three distinct student profiles that consider gender, race, and professional level. We evaluated the performance of GPT-4, LLaMA 3, and PaLM 2 using a zero-shot learning approach without human intervention. A quantitative evaluation was conducted through a custom natural language processing analytics pipeline to highlight the uniqueness of the responses and to identify words reflecting each student's profile, including race, gender, or professional level. The analysis of frequently used words in the responses indicates that GPT-4 offers more personalized mentoring than the other two LLMs. Additionally, a qualitative evaluation was performed to see if human experts reached similar conclusions. The analysis of survey responses shows that GPT-4 outperformed the other two LLMs in delivering more accurate and useful mentoring while addressing specific challenges with encouraging language. Our work establishes a foundation for developing personalized mentoring tools based on LLMs, incorporating human mentors in the process to deliver a more impactful and tailored mentoring experience.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.0843

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)

Genre:

Instructional Material (0.88)
Research Report (0.83)
Questionnaire & Opinion Survey (0.66)

Industry:

Education > Curriculum > Subject-Specific Education (0.95)
Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Evaluating Gender Bias in Large Language Models

Döll, Michael, Döhring, Markus, Müller, Andreas

arXiv.org Artificial Intelligence, Nov-14-2024

Gender bias in artificial intelligence has become an important issue, particularly in the context of language models used in communication-oriented applications. This study examines the extent to which Large Language Models (LLMs) exhibit gender bias in pronoun selection in occupational contexts. The analysis evaluates the models GPT-4, GPT-4o, PaLM 2 Text Bison and Gemini 1.0 Pro using a self-generated dataset. The jobs considered include a range of occupations, from those with a significant male presence to those with a notable female concentration, as well as jobs with a relatively equal gender distribution. Three different sentence processing methods were used to assess potential gender bias: masked tokens, unmasked sentences, and sentence completion. In addition, the LLMs suggested names of individuals in specific occupations, which were then examined for gender distribution. The results show a positive correlation between the models' pronoun choices and the gender distribution present in U.S. labor force data. Female pronouns were more often associated with female-dominated occupations, while male pronouns were more often associated with male-dominated occupations. Sentence completion showed the strongest correlation with actual gender distribution, while name generation resulted in a more balanced 'politically correct' gender distribution, albeit with notable variations in predominantly male or female occupations. Overall, the prompting method had a greater impact on gender distribution than the model selection itself, highlighting the complexity of addressing gender bias in LLMs. The findings highlight the importance of prompting in gender mapping.

gender distribution, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2411.09826

Country:

North America > United States > Pennsylvania (0.04)
Europe > Russia (0.04)
Europe > Italy (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government (0.47)
Law (0.46)
Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting

Hebert, Liam, Sayana, Krishna, Jash, Ambarish, Karatzoglou, Alexandros, Sodhi, Sukhdeep, Doddapaneni, Sumanth, Cai, Yanli, Kuzmin, Dima

arXiv.org Artificial Intelligence, Aug-1-2024

Understanding the nuances of a user's extensive interaction history is key to building accurate and personalized natural language systems that can adapt to evolving user preferences. To address this, we introduce PERSOMA, Personalized Soft Prompt Adapter architecture. Unlike previous personalized prompting methods for large language models, PERSOMA offers a novel approach to efficiently capture user history. It achieves this by resampling and compressing interactions as free form text into expressive soft prompt embeddings, building upon recent research utilizing embedding representations as input for LLMs. We rigorously validate our approach by evaluating various adapter architectures, first-stage sampling strategies, parameter-efficient tuning techniques like LoRA, and other personalization methods. Our results demonstrate PERSOMA's superior ability to handle large and complex user histories compared to existing embedding-based and text-prompt-based techniques.

history, persoma, personalization, (12 more...)

arXiv.org Artificial Intelligence

2408.0096

Country:

North America > United States > California > Santa Clara County > Mountain View (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
South America > Colombia > Meta Department > Villavicencio (0.04)
(5 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment (0.68)
Media > Film (0.47)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)


Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems

Gomez, Frank Palma, Sanabria, Ramon, Sung, Yun-hsuan, Cer, Daniel, Dalmia, Siddharth, Abrego, Gustavo Hernandez

arXiv.org Artificial Intelligence, Jul-10-2024

Large language models (LLMs) are trained on text-only data that go far beyond the languages with paired speech and text data. At the same time, Dual Encoder (DE) based retrieval systems project queries and documents into the same embedding space and have demonstrated their success in retrieval and bi-text mining. To match speech and text in many languages, we propose using LLMs to initialize multi-modal DE retrieval systems. Unlike traditional methods, our system doesn't require speech data during LLM pre-training and can exploit LLM's multilingual text understanding capabilities to match speech and text in languages unseen during retrieval training. Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages. Our system outperforms previous systems trained explicitly on all 102 languages. We achieve a 10% absolute improvement in Recall@1 averaged across these languages. Additionally, our model demonstrates cross-lingual speech and text matching, which is further enhanced by readily available machine translation data.
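For readers unfamiliar with the metric, the sketch below shows how Recall@1 might be computed for a dual-encoder retriever: each query embedding is scored against every document embedding, and a hit is counted when the top-scoring document is the paired one. The toy two-dimensional vectors, the use of cosine similarity, and the function names are assumptions made for illustration; the abstract does not describe the authors' scoring function or evaluation code.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def recall_at_1(query_embs: List[Sequence[float]], doc_embs: List[Sequence[float]]) -> float:
    """Fraction of queries whose best-scoring document shares their index."""
    hits = 0
    for i, query in enumerate(query_embs):
        scores = [cosine(query, doc) for doc in doc_embs]
        best = max(range(len(doc_embs)), key=lambda j: scores[j])
        hits += int(best == i)
    return hits / len(query_embs)


# Hypothetical embeddings: speech[i] is paired with text[i].
speech = [[1.0, 0.0], [0.6, 0.6], [0.6, 0.8]]
text = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7]]
recall = recall_at_1(speech, text)
print(f"Recall@1 = {recall:.3f}")
```

In this toy data the second speech query retrieves the third text item instead of its own pair, so two of three queries count as hits.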

llm, preprint arxiv, speech and text, (15 more...)

arXiv.org Artificial Intelligence

2404.01616

Country:

North America > United States (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions

Chen, Hanjie, Fang, Zhouxiang, Singla, Yash, Dredze, Mark

arXiv.org Artificial Intelligence, Jun-25-2024

LLMs have demonstrated impressive performance in answering medical questions, such as achieving passing scores on medical licensing examinations. However, medical board exam or general clinical questions do not capture the complexity of realistic clinical cases. Moreover, the lack of reference explanations means we cannot easily evaluate the reasoning of model decisions, a crucial component of supporting doctors in making complex medical decisions. To address these challenges, we construct two new datasets: JAMA Clinical Challenge and Medbullets. JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions. Both datasets are structured as multiple-choice question-answering tasks, accompanied by expert-written explanations. We evaluate seven LLMs on the two datasets using various prompts. Experiments demonstrate that our datasets are harder than previous benchmarks. Human and automatic evaluations of model-generated explanations provide insights into the promise and deficiency of LLMs for explainable medical QA.

answer choice, explanation, symptom, (15 more...)

arXiv.org Artificial Intelligence

2402.1806

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > Dominican Republic (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)


Inducing Group Fairness in LLM-Based Decisions

Atwood, James, Lahoti, Preethi, Balashankar, Ananth, Prost, Flavien, Beirami, Ahmad

arXiv.org Artificial Intelligence, Jun-24-2024

Prompting Large Language Models (LLMs) has created new and interesting means for classifying textual data. While evaluating and remediating group fairness is a well-studied problem in classifier fairness literature, some classical approaches (e.g., regularization) do not carry over, and some new opportunities arise (e.g., prompt-based remediation). We measure fairness of LLM-based classifiers on a toxicity classification task, and empirically show that prompt-based classifiers may lead to unfair decisions. We introduce several remediation techniques and benchmark their fairness and performance trade-offs. We hope our work encourages more research on group fairness in LLM-based classifiers.
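The abstract does not say which fairness metric the authors measure. As one common possibility, the sketch below computes the false positive rate per group for a binary toxicity classifier and reports the gap between groups. The labels, predictions, group names, and function name are all hypothetical and chosen only to illustrate the calculation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def false_positive_rate_by_group(
    labels: Sequence[int],
    preds: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """False positive rate (non-toxic text flagged as toxic) for each group."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        if y == 0:  # only truly non-toxic examples can be false positives
            negatives[g] += 1
            false_pos[g] += int(p == 1)
    return {g: false_pos[g] / negatives[g] for g in negatives}


# Hypothetical data: 1 = toxic, 0 = non-toxic; five examples per group.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
groups = ["a"] * 5 + ["b"] * 5

rates = false_positive_rate_by_group(labels, preds, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A gap of zero would mean both groups have non-toxic text flagged at the same rate; remediation techniques of the kind the abstract mentions would aim to shrink this gap while limiting the loss in overall accuracy.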

classification, classifier, fairness, (16 more...)

arXiv.org Artificial Intelligence

2406.16738

Country: Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
